AEGD: adaptive gradient descent with energy
Authors
Abstract
We propose AEGD, a new algorithm for the optimization of non-convex objective functions, based on a dynamically updated 'energy' variable. The method is shown to be unconditionally energy stable, irrespective of the base step size. We prove energy-dependent convergence rates of AEGD for both non-convex and convex objectives, which for a suitably small step size recover the desired rates of batch gradient descent. We also provide an energy-dependent bound on stationary convergence in the stochastic setting. AEGD is straightforward to implement and requires little tuning of hyper-parameters. Experimental results demonstrate that it works well for a large variety of problems. Specifically, it is robust with respect to initial data and capable of making rapid initial progress, and it shows comparable, often better, generalization performance than SGD with momentum on deep neural networks. The code is available at https://github.com/txping/AEGD.
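The abstract describes AEGD only at a high level. As a rough illustration, the sketch below implements an energy-based gradient update in the spirit of that description: the objective is shifted by a constant c so that f(θ)+c stays positive, a per-coordinate 'energy' variable r starts at sqrt(f(θ0)+c) and can only shrink (which is what makes the step unconditionally energy stable regardless of the step size), and the parameters move along the gradient scaled by r. The specific update formulas, the constant c, and the function names are assumptions for illustration, not taken from this page; the authoritative implementation is in the linked repository.

```python
import numpy as np

def aegd_sketch(grad_f, f, theta0, eta=0.1, c=1.0, n_steps=100):
    """Minimal AEGD-style sketch (assumed update rule, not taken from this page).

    F(theta) = sqrt(f(theta) + c) plays the role of the 'energy'; r tracks F
    per coordinate and is updated so that it can only decrease, giving
    unconditional energy stability irrespective of the base step size eta.
    """
    theta = np.asarray(theta0, dtype=float)
    r = np.full_like(theta, np.sqrt(f(theta) + c))          # r_0 = F(theta_0)
    for _ in range(n_steps):
        v = grad_f(theta) / (2.0 * np.sqrt(f(theta) + c))   # v_k ~ grad F(theta_k)
        r = r / (1.0 + 2.0 * eta * v * v)                    # energy shrinks monotonically
        theta = theta - 2.0 * eta * r * v                    # energy-scaled descent step
    return theta

# Usage on a simple quadratic: f(x) = ||x||^2 / 2, grad f(x) = x
f = lambda x: 0.5 * float(np.dot(x, x))
grad_f = lambda x: x
x_star = aegd_sketch(grad_f, f, theta0=np.ones(5), eta=0.5, c=1.0, n_steps=200)
```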
Similar references
Adaptive Online Gradient Descent
We study the rates of growth of the regret in online convex optimization. First, we show that a simple extension of the algorithm of Hazan et al. eliminates the need for a priori knowledge of the lower bound on the second derivatives of the observed functions. We then provide an algorithm, Adaptive Online Gradient Descent, which interpolates between the results of Zinkevich for linear functions ...
Adaptive Variance Reducing for Stochastic Gradient Descent
Variance Reducing (VR) stochastic methods are fast-converging alternatives to the classical Stochastic Gradient Descent (SGD) for solving large-scale regularized finite sum problems, especially when a highly accurate solution is required. One critical step in VR is the function sampling. State-of-the-art VR algorithms such as SVRG and SAGA, employ either Uniform Probability (UP) or Importance P...
Adaptive wavefront control with asynchronous stochastic parallel gradient descent clusters.
A scalable adaptive optics (AO) control system architecture composed of asynchronous control clusters based on the stochastic parallel gradient descent (SPGD) optimization technique is discussed. It is shown that subdivision of the control channels into asynchronous SPGD clusters improves the AO system performance by better utilizing individual and/or group characteristics of adaptive system co...
Stochastic Gradient Descent with GPGPU
We show how to optimize a Support Vector Machine and a predictor for Collaborative Filtering with Stochastic Gradient Descent on the GPU, achieving 1.66 to 6-times accelerations compared to a CPUbased implementation. The reference implementations are the Support Vector Machine by Bottou and the BRISMF predictor from the Netflix Prices winning team. Our main idea is to create a hash function of ...
Learning to learn by gradient descent by gradient descent
The move from hand-designed features to learned features in machine learning has been wildly successful. In spite of this, optimization algorithms are still designed by hand. In this paper we show how the design of an optimization algorithm can be cast as a learning problem, allowing the algorithm to learn to exploit structure in the problems of interest in an automatic way. Our learned algorit...
Journal
Journal title: Numerical Algebra, Control and Optimization
Year: 2023
ISSN: 2155-3297, 2155-3289
DOI: https://doi.org/10.3934/naco.2023015